Reviews: Thinking Fast and Slow with Deep Learning and Tree Search
SUMMARY: The paper proposes an algorithm that combines imitation learning with tree search, resulting in an apprentice learning from an ever-improving expert. A DQN is trained to imitate a policy derived from an MCTS agent, with the DQN providing generalisation to unseen states. The DQN is then used to guide the expert's search, improving the expert, which in turn generates better targets for retraining the DQN, and so on. The paper makes two contributions: (1) the iterative expert/apprentice training framework described above, and (2) a new target for imitation learning, which is empirically shown to outperform a previously suggested target and which results in the apprentice learning a policy of equal strength to the expert.

COMMENTS: I found the paper generally well-written, clear, and easy to follow, barring Section 6.
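The expert/apprentice loop described in the summary can be sketched as a toy Expert Iteration loop. This is a minimal illustration, not the paper's implementation: the one-step game with hidden reward means, the tabular softmax "apprentice", and the Monte-Carlo "expert" are stand-ins for the paper's MCTS expert and DQN apprentice, and all names are invented for this sketch.

```python
import math
import random

random.seed(0)

ACTIONS = [0, 1, 2]
TRUE_MEANS = [0.2, 0.8, 0.5]  # hidden reward means of a toy one-step game

def simulate(action):
    """Environment stand-in: noisy reward around the action's hidden mean."""
    return TRUE_MEANS[action] + random.gauss(0, 0.1)

class Apprentice:
    """Fast policy: a softmax over per-action logits (stand-in for the DQN)."""
    def __init__(self):
        self.logits = [0.0] * len(ACTIONS)

    def probs(self):
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, target, lr=0.5):
        # Imitation step: cross-entropy toward the expert's visit
        # distribution (gradient w.r.t. softmax logits is p - target).
        p = self.probs()
        for a in ACTIONS:
            self.logits[a] -= lr * (p[a] - target[a])

def expert(apprentice, n_sims=200, c=1.0):
    """Slow policy: Monte-Carlo search guided by the apprentice's priors."""
    prior = apprentice.probs()
    counts = [0] * len(ACTIONS)
    values = [0.0] * len(ACTIONS)
    for t in range(1, n_sims + 1):
        # PUCT-like selection: exploit mean value, explore via the prior.
        scores = [values[a] + c * prior[a] * math.sqrt(t) / (1 + counts[a])
                  for a in ACTIONS]
        a = scores.index(max(scores))
        r = simulate(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean
    total = sum(counts)
    return [n / total for n in counts]  # expert's improved policy target

apprentice = Apprentice()
for iteration in range(10):         # Expert Iteration loop
    target = expert(apprentice)     # expert improvement (search) step
    for _ in range(20):
        apprentice.train(target)    # imitation learning step

final_probs = apprentice.probs()
best = final_probs.index(max(final_probs))
```

After a few iterations the apprentice's distribution concentrates on the best action, and the sharpened prior in turn makes the expert's search more focused, illustrating the mutual-improvement cycle.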
Thinking Fast and Slow with Deep Learning and Tree Search
Thomas Anthony, Zheng Tian, David Barber
Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Learning From Scratch by Thinking Fast and Slow with Deep Learning and Tree Search
According to dual process theory, human reasoning consists of two different kinds of thinking. System 1 is a fast, unconscious and automatic mode of thought, also known as intuition. System 2 is a slow, conscious, explicit and rule-based mode of reasoning that is believed to be an evolutionarily recent process. When learning to complete a challenging planning task, such as playing a board game, humans exploit both processes: strong intuitions allow for more effective analytic reasoning by rapidly selecting interesting lines of play for consideration. Repeated deep study gradually improves intuitions.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)